Preamble

This document contains a detailed methodology for the R package covid19.Explorer and shiny-based website covid19-explorer.org. For more details, please check out the following pre-print on medRxiv:

Revell, L. J. covid19.Explorer: A web application and R package to explore United States COVID-19 data. medRxiv pre-print doi: https://doi.org/10.1101/2021.02.15.21251782.

In this package & web application we estimate total infections (from deaths & confirmed cases), graph the distribution of infections over time among U.S. states, visualize COVID-19 deaths by age & by sex (as well as compared to deaths from all causes), and analyze excess mortality through time in 2020 as a function of age & state or jurisdiction.

Virtually all of our data come from the United States Centers for Disease Control and Prevention National Center for Health Statistics or the U.S. Census Bureau. Unfortunately, at this time we can only offer detailed analysis of United States COVID-19 and excess mortality data. It would be of great interest to us to compare not only U.S. states & jurisdictions, but also foreign countries; however, the wide variety of ways in which these data are recorded and reported, as well as variation in data quality between regions, puts this endeavor beyond the scope of our project here.

Using this page

The purpose of this project is to help users better understand the 2020 COVID-19 pandemic in the United States. Users from other countries might also be interested in the project, for instance because the age distribution of mortality has been broadly similar among affected areas.

This project features two main types of web application.

The first of these (exemplified by the U.S. COVID-19 infections, Iceberg plot, State comparison, Plausible range, and Infection estimator tabs) consists of applications that are designed to estimate the true number of COVID-19 infections over the course of the pandemic. For reasons that we elaborate below, confirmed COVID-19 cases underestimate the true number of infections, usually by a substantial margin. Since there are a variety of reasons that the true number of infections (rather than simply the number of confirmed cases) is of interest, these applications are designed to help users apply a model to estimate the daily number of new infections, the plausible range of new infections, the cumulative number of infections, or the daily or cumulative infections as a percentage or per 1M population.

Each of these applications uses a model - but it’s one that needs to be parameterized by the user. Users parameterize the model by specifying a value or set of values for the infection fatality ratio of COVID-19, as well as an average lag time from infection to death. The values for each of these model parameters have been set to default to fairly reasonable numbers, as detailed in the sections below; however, users are nonetheless strongly encouraged to apply multiple values and examine the sensitivity of their results. In fact, this is one of the main purposes of the project!

The second type of application (exemplified by the tabs Deaths by age, Excess mortality by age, and By state) does not employ a model and exists primarily to permit the user to interact with CDC COVID-19 death and 2020 excess mortality data, to understand the implications of these data, and to generate interesting or useful data visualizations.

Detailed methodology for each type of analysis in this project is given in the sections below.

Estimating infections

Since the beginning of this pandemic, it’s been widely understood that confirmed COVID-19 cases underestimate the true number of infections, sometimes by a very large margin.

This is likely due to multiple factors. One factor is that there has been limited testing capacity throughout the COVID-19 pandemic in the United States, particularly when the pandemic was in its earliest days. A second significant factor affecting the disconnect between observed cases and true infections is the fact that testing is voluntary, population surveillance testing is rare, and many cases of SARS-CoV-2 infection present asymptomatically or with mild symptoms.

As such, we consider confirmed COVID-19 deaths to be a much more reliable indicator of disease burden than confirmed cases; however, deaths are a lagging indicator of infections.

Consequently, we chose to use a model in which the number of new infections on day i was estimated by taking the number of observed COVID-19 deaths on day i + k and then dividing this quantity by the infection fatality ratio (also called the infection fatality rate or IFR). In other words, given 50 COVID-19 deaths on day i + k, and an IFR of 0.5%, we would predict that 10,000 new SARS-CoV-2 infections had occurred on day i.

Both k, the average lag time from infection to death (in cases of SARS-CoV-2 infections resulting in death), and the IFR are to be specified by the user.
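The calculation described above can be sketched in a few lines of R. This is only an illustration under the stated model: the helper name `estimate_infections` is hypothetical and is not a function of the covid19.Explorer package, and it omits the smoothing steps described later in this section.

```r
# Hypothetical helper (not part of covid19.Explorer): estimate infections on
# day i from the deaths observed on day i + k, divided by the IFR.
estimate_infections <- function(deaths, ifr, k) {
  n <- length(deaths)
  ifr <- rep_len(ifr, n)          # ifr may be a single value or one value per day
  infections <- rep(NA_real_, n)  # the last k days cannot be estimated this way
  if (k < n) {
    idx <- seq_len(n - k)
    infections[idx] <- deaths[idx + k] / ifr[idx]
  }
  infections
}

# Worked example from the text: 50 deaths on day i + k and an IFR of 0.5%
deaths <- c(rep(0, 21), 50)
est <- estimate_infections(deaths, ifr = 0.005, k = 21)
est[1]  # approximately 10,000 infections on day 1
```

Note that the final k entries of the result are missing, which is why a separate approach is needed for the most recent days.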

A fairly reasonable lag time between infection & death is approximately three weeks. For example, during a large outbreak in Melbourne, Australia the time difference between the peak recorded cases and peak confirmed COVID-19 deaths was ~17 days. (Infected persons normally test negative for the first few days following infection, so this more or less corresponds with a lag-time of three weeks.)

Likewise, IFR values ranging from about 0.3% to >1.0% have been reported over the course of the pandemic.

For instance, a study based on an early, super-spreader event in Germany estimated an IFR (corrected to the demographic distribution of the local population) of 0.36%. Other studies have reported higher estimated IFR. For example, an Italian study estimated an overall infection fatality rate of 1.31%.

In general, it’s reasonable to suppose that IFR has fallen through time as treatment of severely ill patients has improved. Likewise, even within the U.S., IFR is unlikely to be the same at a given date in different jurisdictions, due to differences in demographic structure between areas.

We think it is reasonable to suppose an IFR no greater than 1.0% and declining from the start of the pandemic towards the present, with a current IFR of around 0.3%. We permit the user to specify a time-varying IFR by fixing the IFR at each quarter and then interpolating daily IFR between each quarter using local regression smoothing (LOESS).
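A rough sketch of this kind of interpolation is given below. It uses the quarterly IFR vector from the examples in this document, but the placement of the quarterly values at evenly spaced days, the number of days, and the `span` and `degree` settings are all assumptions made for illustration and are not necessarily what the package does internally.

```r
# Quarterly IFR values (the vector used in the examples in this document)
ifr_quarterly <- c(0.015, 0.01, 0.007, 0.006, 0.006, 0.006, 0.006)

# Assumption: place the quarterly values at evenly spaced days
n_days <- 540
anchor_days <- round(seq(1, n_days, length.out = length(ifr_quarterly)))

# Local regression through the anchor points
# (span = 1 and degree = 1 are illustrative settings only)
fit <- loess(ifr ~ day,
             data = data.frame(day = anchor_days, ifr = ifr_quarterly),
             span = 1, degree = 1)

# Daily IFR values interpolated between the quarterly anchors
ifr_daily <- predict(fit, newdata = data.frame(day = seq_len(n_days)))
range(ifr_daily)
```

Because LOESS is a smoother rather than an exact interpolator, the daily curve need not pass precisely through each quarterly value.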

Reporting can vary through time including regularly over the course of the week. (For instance, fewer COVID-19 deaths tend to be reported on the weekends compared to Monday through Friday.) To take these reporting artifacts into account we used both moving averaging and local regression (LOESS) smoothing. Both the window for the moving average and the LOESS smoothing parameter are controlled by the user.
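The two smoothing steps can be illustrated with base R as follows. The death counts here are synthetic (simulated with an artificial weekend reporting dip), and the particular `window` and `span` values are arbitrary choices for the sketch rather than package defaults.

```r
set.seed(1)
n_days <- 120
day <- seq_len(n_days)

# Synthetic daily deaths with fewer reports on two days of each week
trend <- 100 * exp(-((day - 60) / 25)^2)
weekday_effect <- ifelse((day %% 7) %in% c(0, 6), 0.6, 1.15)
deaths <- rpois(n_days, trend * weekday_effect)

# Centered moving average; the window width is chosen by the user
window <- 7
moving_avg <- stats::filter(deaths, rep(1 / window, window), sides = 2)

# LOESS smoothing; the span is chosen by the user
fit <- loess(deaths ~ day, span = 0.3)
smoothed <- pmax(predict(fit), 0)  # truncate at zero: counts cannot be negative
```

A centered moving average leaves missing values at both ends of the series, which is one reason to combine it with (or replace it by) a local regression fit.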

The approach of using only confirmed COVID-19 deaths - though robust - does not permit us to estimate the true number of infections between now - k days and the present. For this, we assumed a sigmoidal relationship between time and the ratio of confirmed cases over the true number of infections.

Since the number of confirmed cases cannot exceed the true number of new infections, logic dictates that this ratio must have a value between 0 and 1.

We decided on a sigmoidal relationship because we assume the ratio was low early in the pandemic when confirming a new infection was limited primarily by testing capacity, but has probably risen (in many localities) to a more or less consistent value as test capacity increased. Since getting tested is voluntary, and since many infections of SARS-CoV-2 are asymptomatic or only mildly symptomatic, it is unlikely to rise to near 1.0 in the U.S. regardless of testing.

Figure 1 below shows the daily ratio of confirmed cases / estimated infections (under our model) for all U.S. data over the entire course of the pandemic to date.

Although the fit of the sigmoid function is relatively good, readers might note that the fitted curve seems to overestimate the ratio of confirmed cases / estimated infections near the start of the pandemic (i.e., to the left-hand side of the plot). This is not particularly worrisome because the most important part of the graph is the right side, since this is the part that will be used to extrapolate recent infection numbers from observed cases.

library(covid19.Explorer)
data(Data)
infection.estimator(data=Data,
    state="United States",
    ifr=c(0.015,0.01,0.007,0.006,0.006,0.006,0.006),
    las=1,cex.axis=0.8,cex.lab=0.9,plot="infection.ratio",
    span=c(0.1,0.3),window=7)
abline(h=seq(0.1,0.6,by=0.1),lty="dotted",col="grey")
**Figure 1**: a) Observed U.S. daily COVID-19 deaths (red bars) and assumed IFR function (blue line). b) Ratio of confirmed COVID-19 cases / estimated infections.

Figure 1 seems to show a ratio of confirmed / estimated infections of about 0.4 at the present; however, the reader should keep in mind that in practice this value is estimated separately for each jurisdiction that is being analyzed, and as such might be lower in some states and higher in others, even for a constant IFR value or function.

After fitting this sigmoidal curve to our observed and estimated cases through now - k days, we then turn to the last period. To obtain estimated infections for these days, we simply divide our observed cases from the last k days of data by our fitted values from the sigmoid curve.
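The fit-then-extrapolate step just described might be sketched as below. Everything here is an assumption made for illustration: the three-parameter form of the sigmoid, the least-squares fit via `optim`, and the synthetic ratio and case counts. The package may parameterize and fit the curve differently.

```r
# Hypothetical three-parameter sigmoid for the ratio of confirmed cases to
# estimated infections; plogis() keeps the asymptote between 0 and 1.
sigmoid <- function(t, p) plogis(p[1]) / (1 + exp(-(t - p[2]) / exp(p[3])))

set.seed(1)
n_days <- 300
k <- 21
day <- seq_len(n_days)
known <- day <= n_days - k  # days that have death-based infection estimates

# Synthetic "observed" ratio (illustration only)
ratio <- 0.4 / (1 + exp(-(day - 90) / 20)) + rnorm(n_days, sd = 0.02)

# Least-squares fit using only the days up to now - k
sse <- function(p) sum((ratio[known] - sigmoid(day[known], p))^2)
fit <- optim(c(qlogis(0.5), 100, log(10)), sse,
             control = list(parscale = c(1, 50, 1), maxit = 2000))

# For the last k days, divide confirmed cases by the fitted ratio
cases_recent <- rep(20000, k)  # synthetic confirmed daily cases
ratio_recent <- sigmoid(day[!known], fit$par)
infections_recent <- cases_recent / ratio_recent
```

Because the fitted ratio is constrained to lie below 1, the extrapolated infections can never fall below the confirmed case counts.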

Figure 2 shows the result of this analysis applied to data for the U.S. state of Massachusetts.

infection.estimator(data=Data,
    state="Massachusetts",
    ifr=c(0.015,0.01,0.007,0.006,0.006,0.006,0.006),
    span=c(0.1,0.3),window=1,
    las=1,cex.axis=0.8,cex.lab=0.9)
**Figure 2**: a) Observed daily COVID-19 deaths and an assumed model of IFR in which the infection fatality ratio is initially high (~1.5%), but then declines and stabilizes at around 0.6% through the present day. b) Estimated daily infections (green), cases (blue), and deaths (red).

In addition to computing the raw number of daily infections, this method can also be used to compute infections as a percentage of the total population.

For this calculation, we obtained state populations through time from the U.S. Census Bureau.

Data are only given through 2019, so to estimate state-level 2020 population sizes, we used a total mid-year 2020 U.S. population estimate of 331,002,651 to ‘correct’ each 2019 state population size to a 2020 level.
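One simple way to carry out such a correction is a proportional rescaling, sketched below. This assumes the correction multiplies every jurisdiction by the same factor; the helper name and the toy populations are hypothetical, and the output is not intended to reproduce the values in Table 1.

```r
# Hypothetical helper: rescale populations so that they sum to a target total.
# Assumes the 'correction' is a simple proportional rescaling.
rescale_population <- function(pop, target_total, current_total = sum(pop)) {
  pop * target_total / current_total
}

# Toy example with three made-up jurisdictions (not Census values)
pop_2019 <- c(A = 4000000, B = 1000000, C = 5000000)
pop_2020 <- rescale_population(pop_2019, target_total = 10100000)
round(pop_2020)
```

Under this scheme each jurisdiction keeps its 2019 share of the total, so relative population sizes are unchanged.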

Our dataset did not include a separate population size estimate for Puerto Rico - so we used the values from Wikipedia, and a similar correction to adjust to 2020 levels. (This probably resulted in a mid-2020 population size that is too high based on this alternative source; however we imagine that this is a relatively small effect.)

Finally, CDC mortality data splits New York City (NYC) from the rest of New York state. Since this contrast is interesting, we maintained this separation and used a mid-2019 population estimate of 8,336,817 for NYC, then simply assumed that the population of NYC has changed between 2015 and 2020 in proportion to the rest of the state. (Since they have a part : whole relationship, this seemed pretty reasonable.)

States<-state.deaths(plot="States")
States<-States[sort(rownames(States)),]
States<-States[-which(rownames(States)=="United States"),]
knitr::kable(round(States),
    caption="**Table 1**: Estimated state population by year for 50 states, D.C., and Puerto Rico. New York is divided into two jurisdictions: New York City and New York, excluding NYC.",
    align="r", format = "html", table.attr = "style='width:75%;'",
    format.args=list(big.mark=","))
Table 1: Estimated state population by year for 50 states, D.C., and Puerto Rico. New York is divided into two jurisdictions: New York City and New York, excluding NYC.
2015 2016 2017 2018 2019 2020
Alabama 4,852,347 4,863,525 4,874,486 4,887,681 4,903,185 4,944,062
Alaska 737,498 741,456 739,700 735,139 731,545 737,644
Arizona 6,829,676 6,941,072 7,044,008 7,158,024 7,278,717 7,339,399
Arkansas 2,978,048 2,989,918 3,001,345 3,009,733 3,017,804 3,042,963
California 38,918,045 39,167,117 39,358,497 39,461,588 39,512,223 39,841,633
Colorado 5,450,623 5,539,215 5,611,885 5,691,287 5,758,736 5,806,746
Connecticut 3,587,122 3,578,141 3,573,297 3,571,520 3,565,287 3,595,010
Delaware 941,252 948,921 956,823 965,479 973,764 981,882
District of Columbia 675,400 685,815 694,906 701,547 705,749 711,633
Florida 20,209,042 20,613,477 20,963,613 21,244,317 21,477,737 21,656,795
Georgia 10,178,447 10,301,890 10,410,330 10,511,131 10,617,423 10,705,939
Hawaii 1,422,052 1,427,559 1,424,393 1,420,593 1,415,872 1,427,676
Idaho 1,651,059 1,682,380 1,717,715 1,750,536 1,787,065 1,801,964
Illinois 12,858,913 12,820,527 12,778,828 12,723,071 12,671,821 12,777,465
Indiana 6,608,422 6,634,304 6,658,078 6,695,497 6,732,219 6,788,345
Iowa 3,120,960 3,131,371 3,141,550 3,148,618 3,155,070 3,181,374
Kansas 2,909,011 2,910,844 2,908,718 2,911,359 2,913,314 2,937,602
Kentucky 4,425,976 4,438,182 4,452,268 4,461,153 4,467,673 4,504,920
Louisiana 4,664,628 4,678,135 4,670,560 4,659,690 4,648,794 4,687,551
Maine 1,328,262 1,331,317 1,334,612 1,339,057 1,344,212 1,355,419
Maryland 5,985,562 6,003,323 6,023,868 6,035,802 6,045,680 6,096,082
Massachusetts 6,794,228 6,823,608 6,859,789 6,882,635 6,892,503 6,949,965
Michigan 9,931,715 9,950,571 9,973,114 9,984,072 9,986,857 10,070,117
Minnesota 5,482,032 5,522,744 5,566,230 5,606,249 5,639,632 5,686,649
Mississippi 2,988,471 2,987,938 2,988,510 2,981,020 2,976,149 3,000,961
Missouri 6,071,732 6,087,135 6,106,670 6,121,623 6,137,428 6,188,595
Montana 1,030,475 1,040,859 1,052,482 1,060,665 1,068,778 1,077,688
Nebraska 1,891,277 1,905,616 1,915,947 1,925,614 1,934,408 1,950,535
Nevada 2,866,939 2,917,563 2,969,905 3,027,341 3,080,156 3,105,835
New Hampshire 1,336,350 1,342,307 1,348,787 1,353,465 1,359,711 1,371,047
New Jersey 8,867,949 8,870,827 8,885,525 8,886,025 8,882,190 8,956,240
New Mexico 2,089,291 2,091,630 2,091,784 2,092,741 2,096,829 2,114,310
New York 11,231,666 11,219,529 11,194,468 11,160,626 11,116,744 11,209,423
New York City 8,423,000 8,413,899 8,395,104 8,369,725 8,336,817 8,406,320
North Carolina 10,031,646 10,154,788 10,268,233 10,381,615 10,488,084 10,575,522
North Dakota 754,066 754,434 754,942 758,080 762,062 768,415
Ohio 11,617,527 11,634,370 11,659,650 11,676,341 11,689,100 11,786,551
Oklahoma 3,909,500 3,926,331 3,931,316 3,940,235 3,956,971 3,989,960
Oregon 4,015,792 4,089,976 4,143,625 4,181,886 4,217,737 4,252,900
Pennsylvania 12,784,826 12,782,275 12,787,641 12,800,922 12,801,989 12,908,718
Puerto Rico 3,473,232 3,406,672 3,325,286 3,193,354 3,193,694 3,220,320
Rhode Island 1,056,065 1,056,770 1,055,673 1,058,287 1,059,361 1,068,193
South Carolina 4,891,938 4,957,968 5,021,268 5,084,156 5,148,714 5,191,638
South Dakota 853,988 862,996 872,868 878,698 884,659 892,034
Tennessee 6,591,170 6,646,010 6,708,799 6,771,631 6,829,174 6,886,108
Texas 27,470,056 27,914,410 28,295,273 28,628,666 28,995,881 29,237,617
Utah 2,981,835 3,041,868 3,101,042 3,153,550 3,205,958 3,232,686
Vermont 625,216 623,657 624,344 624,358 623,989 629,191
Virginia 8,361,808 8,410,106 8,463,587 8,501,286 8,535,519 8,606,679
Washington 7,163,657 7,294,771 7,423,362 7,523,869 7,614,893 7,678,378
West Virginia 1,842,050 1,831,023 1,817,004 1,804,291 1,792,147 1,807,088
Wisconsin 5,760,940 5,772,628 5,790,186 5,807,406 5,822,434 5,870,975
Wyoming 585,613 584,215 578,931 577,601 578,759 583,584


The question of cumulative percent infected is relevant to the (unnecessarily controversial) concept of ‘herd immunity’. The following graph (Figure 3) shows estimated infections as a fraction of the total population for the U.S. state of Colorado, using the same IFR model as in Figures 1 and 2.

infection.estimator(data=Data,
    state="Colorado",
    ifr=c(0.015,0.01,0.007,0.006,0.006,0.006,0.006),
    las=1,cex.axis=0.8,cex.lab=0.9,
    span=c(0.1,0.3),window=1,
    cumulative=TRUE,percent=TRUE)
**Figure 3**: a) Observed daily COVID-19 deaths and an assumed model of IFR. b) Estimated cumulative infections (green), cases (blue), and deaths (red), as a percentage of the total population of the state.

Though the plot suggests that perhaps around 15-20% of the population in Colorado has already been infected, users should keep in mind that this is entirely dependent on how we decided to model IFR through time. To see this, let’s take the same state (Colorado) and imagine (perhaps relatively naively) a constant IFR of 0.5%, rather than one that declines through time.

infection.estimator(data=Data,
    state="Colorado",
    ifr=0.005,
    las=1,cex.axis=0.8,cex.lab=0.9,
    span=c(0.1,0.3),window=1,
    cumulative=TRUE,percent=TRUE)
**Figure 4**: The same analysis as Figure 3, but assuming a constant IFR of 0.5%.

We can see from Figure 4 that for a lower assumed IFR, a larger number of daily & cumulative infections is implied.

Assumptions of the method

This model is very simple.

We merely assume that if we knew the true number of infections and the IFR for our population of interest on day i, then we could predict the number of deaths on day i + k, in which k is the lag-time from infection to death (for SARS-CoV-2 infections leading to death). Having observed the deaths, & supposing a particular value of IFR for day i, we can likewise work backwards and reconstruct the most plausible number of infections for that day.

Although the model does not pre-suppose a specific value or function for IFR, it does require that one be specified by the user. As such, it is probably worth mentioning the effect of setting an IFR value that is too high or too low compared to the true IFR for the population of interest.

An IFR that is too high (overall or at a specific time during the pandemic) will have the general effect of causing us to systematically underestimate the true number of infections. This makes sense because if we imagine observing 50 COVID-19 deaths, an IFR of 0.5% would imply that these deaths correspond to a total of 10,000 infections. By contrast, a higher IFR of, say, 1.0% would imply that only 5,000 infections had occurred. Assuming an IFR value that is too low will (obviously) have exactly the opposite effect and thus cause us to overestimate the number of infections that have occurred. (This should help readers understand the difference between Figures 3 & 4, above.)

We do assume a homogeneous value of k at any particular time. In fact, literature sources report lag-times between two and eight weeks; however, we assert that inferences by our method should not be badly off - so long as IFR does not swing about wildly from day to day, and so long as the number of deaths is not extremely few for any reporting period.

We likewise do assume a constant lag-period, k, through time. This assumption is perhaps a bit more dubious, as it is quite reasonable to suspect that, for a specific state or jurisdiction, k might increase as IFR falls. This is a complexity that we explicitly chose to ignore in our model.

We also assume that a more-or-less consistent fraction of COVID-19 deaths are reported as such - that is, that COVID-19 is neither systematically under- nor overreported as the cause of death at any point during the course of the pandemic. A violation of this assumption is not quite as grave as it might seem, however, because it can simply be ‘baked in’ to our model for IFR. For instance, if we imagine that COVID-19 deaths were under-reported near the start of the pandemic (due to limited testing capacity), this can be accommodated into our model for daily infections simply by specifying a slightly lower IFR value at that time (keeping in mind, of course, that the true IFR has generally been falling through time).

In estimating the number of daily infections from now - k days to the present, we assume that the relationship between time (since the first infections) and the ratio of confirmed to estimated infections is sigmoidal in shape (Figure 1). This is a testable assumption that seems to hold fairly well across the entire U.S. (Figure 1) and for some jurisdictions, but less well for others. It’s equally plausible to suspect that this ratio could shift not only as a function of time, but also as demands on testing capacity rise and fall with case numbers. This should be the subject of additional study, but our suspicion is that this would not be likely to have a large effect on our model compared to other simplifications.

Finally, we assume no or limited reporting delay. This is obviously incorrect. There are two main sources of reporting delay: the delay between when an individual is infected and when they test positive for SARS-CoV-2; and the delay between when an infected patient dies and when their death is reported to the CDC. Given this delay in reporting, a more precise interpretation of the estimated number of daily infections is as a (rough) estimate of the number of new individuals who would test positive for SARS-CoV-2 under a hypothetical scenario of universal testing.

Showing observed and estimated unobserved infections using an ‘iceberg plot’

As noted above, it has long been well-understood that the number of daily confirmed COVID-19 cases is an underestimate of the true number of daily infections, sometimes by a very wide margin.

To visualize this phenomenon, we devised an iceberg plot in which we simultaneously graph the number of observed COVID-19 infections (above the ‘waterline’ of the graph) and the estimated number of unobserved infections (below it).

Figure 5 gives this analysis for the U.S. state of New York, in which we assumed the same IFR model through time as was used for Figures 1, 2, and 3. Daily observed cases are smoothed using a 3-day moving average.

iceberg.plot(data=Data,
    state="New York",
    ifr=c(0.015,0.01,0.007,0.006,0.006,0.006,0.006),
    las=1,cex.axis=0.8,cex.lab=0.9,span=c(0.1,0.3),
    window=3)
**Figure 5**: Iceberg plot showing the confirmed daily new infections (above the waterline) and estimated unobserved infections (below it) for the U.S. state of New York.

Assumptions of the method

This method has exactly the same assumptions as we used in estimating infections, above.

Mapping the distribution of infections across states

A hallmark feature of the U.S. COVID-19 pandemic has been the shifting geographic distribution of infections through time among states.

To capture this dynamic, we devised a plotting method in which we overlay the daily or cumulative COVID-19 infections under our model (outlined above), separated by state.

We specifically selected a geographic color palette for this analysis such that RGB color values were made to vary as a function of latitude, longitude, and (arbitrarily) geographic distance from Florida. This is intended to have the effect of making the regional geographic progression of infection more apparent in the graph. The results can be seen in Figure 6.

infections.by.state(data=Data,
    ifr=c(0.015,0.01,0.007,0.006,0.006,0.006,0.006),
    las=1,cex.axis=0.8,cex.lab=0.9,span=0.1)
**Figure 6**: Daily estimated infections separated by state.

Assumptions of the method

This plotting method shares all the assumptions of our infection estimator, above, but adds the additional assumption that our model of IFR is the same for all states. This assumption is quite dubious, in fact, as IFR could be expected to rise in locations where hospital resources are overtaxed by high disease burden; and, conversely, fall in hospitals where staff have more experience in treating COVID-19 patients. Although we don’t doubt that these nuances are important in making specific, quantitative statements about the particular number of infections in each state, we nonetheless believe that our method is effective at visually capturing the overall geographic dynamics of the COVID-19 pandemic in the United States.

Computing a plausible range of infection numbers

A relatively simple extension of our infection estimation method, above, is to admit uncertainty about the specific value of the infection-fatality-ratio at any particular time during the pandemic, and measure the sensitivity of our prediction to a range of different values of IFR. This is valuable, because the question of the IFR for COVID-19 has been the subject of considerable controversy!

This model can be designed to accommodate an assumption of broad uncertainty in IFR early during the pandemic, with both decreasing IFR, as well as falling uncertainty in IFR, towards the present. This is illustrated for data from the U.S. state of Louisiana in Figure 7.

infection.range.estimator(state="Louisiana",
    data=Data,
    ifr.low=c(0.008,0.006,0.005,0.004,0.004,0.004),
    ifr.high=c(0.022,0.014,0.009,0.007,0.007,0.007),
    span=0.1)
**Figure 7**: a) Confirmed COVID-19 deaths. b) A corresponding plausible range of daily COVID-19 infections, under our model.

Assumptions of this method

This plotting method shares all the assumptions of our infection estimator.

It should be noted that although the shaded region around the mean number of daily or cumulative infections in Figure 7 looks like a confidence band, it would only be valid to consider it as such if our high and low values of the IFR through time represent a confidence interval around the true infection-fatality-ratio (and, even then, this confidence interval would only take into account one source of uncertainty about the true daily case numbers - the IFR).
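The logic behind the plausible range in this section can be sketched as follows. The death counts are synthetic, and the two IFR bounds are single constants borrowed from the most recent values in the Figure 7 example. The package itself accepts IFR bounds that vary through time.

```r
# Synthetic deaths on days i + k (illustration only)
deaths_lagged <- c(40, 55, 60, 48)

# Assumed bounds on the infection fatality ratio
ifr_low  <- 0.004
ifr_high <- 0.007

# A low IFR implies many infections; a high IFR implies few
infections_upper <- deaths_lagged / ifr_low
infections_lower <- deaths_lagged / ifr_high

data.frame(deaths = deaths_lagged,
           lower = round(infections_lower),
           upper = round(infections_upper))
```

The width of the resulting band reflects only the assumed spread in IFR, which is why it should not be read as a confidence interval.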

Comparing daily and cumulative infections between states

Another straightforward extension of our above-described model involves directly comparing daily (or cumulative) infections between states.

This could be a useful exercise because it is quite common for (particularly) popular press sources to attribute different infection dynamics in different states to one public health intervention or another. This attribution may be valid, but is often confounded by different infection dynamics through time in the different states being compared.

The following graph (Figure 8) shows a comparison of cumulative deaths and estimated infections in California and Florida.

par(lwd=2)
compare.infections(data=Data,
    state=c("California","Florida"),
    ifr=c(0.015,0.01,0.007,0.006,0.006,0.006,0.006),
    las=1,cex.axis=0.8,cex.lab=0.9,cumulative=TRUE,
    span=c(0.1,0.3))
**Figure 8**: Cumulative confirmed COVID-19 deaths (a) or estimated infections (b) in the U.S. states of California vs. Florida.

Assumptions of this method

This plotting method shares all the assumptions of our infection estimator, and (just like our method for visualizing the geographic dynamics of the pandemic across all U.S. states) requires that we use the same IFR model for each state.

Since the daily and cumulative number of infections scales with population size, valid state-to-state comparisons really only make sense if done on a per-capita basis (e.g., infections / 1M population), as shown in Figure 8.
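To illustrate why the per-capita scaling matters, the sketch below converts cumulative infections to infections per 1M population. The 2020 populations are taken from Table 1, whereas the infection totals are hypothetical numbers invented for this example.

```r
# 2020 population estimates from Table 1
population <- c(California = 39841633, Florida = 21656795)

# Hypothetical cumulative infection estimates (illustration only)
infections <- c(California = 8000000, Florida = 5000000)

# Infections per 1M population
per_million <- infections / population * 1e6
round(per_million)
```

With these made-up totals, California has more infections in absolute terms but fewer per 1M population, so the ranking of the two states depends on the scale used.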

Visualizing the number of COVID-19 deaths by age

In addition to modeling the number of infections through time, the covid19.Explorer R package and website also allow users to visualize the distribution of COVID-19 deaths by age & sex (Figure 9).

covid.deaths(data=Data,
    plot="bar",
    las=1,cex.axis=0.8,cex.lab=0.9)
**Figure 9**: a) Weekly confirmed COVID-19 deaths subdivided by age. b) Weekly COVID deaths (red) and deaths by other causes (blue) for all ages.

As well as graphing the raw number of deaths, this function can also be used to show the deaths / 1M population. To do this, we used estimated U.S. populations by age and sex from the United States Census Bureau (Table 2).

Ages<-Data$Age.Pop
nn<-Ages$Age.Group[Ages$Sex=="Female"]
Ages<-data.frame(Female=Ages$Total.Population[Ages$Sex=="Female"],
    Male=Ages$Total.Population[Ages$Sex=="Male"],
    Total=Ages$Total.Population[Ages$Sex=="Female"]+
    Ages$Total.Population[Ages$Sex=="Male"])
rownames(Ages)<-nn
knitr::kable(Ages,
    caption="**Table 2**: Estimated U.S. population by age group and sex, 2020.",
    align="r", format = "html", table.attr = "style='width:50%;'",
    format.args=list(big.mark=","))
Table 2: Estimated U.S. population by age group and sex, 2020.
Female Male Total
Under 1 year 2,016,583 2,112,227 4,128,810
1-4 years 8,030,910 8,407,948 16,438,858
5-14 years 20,064,260 20,944,619 41,008,879
15-24 years 21,057,902 22,048,975 43,106,877
25-34 years 22,947,659 23,942,277 46,889,936
35-44 years 21,256,845 21,370,925 42,627,770
45-54 years 20,650,440 20,191,496 40,841,936
55-64 years 22,221,321 20,798,044 43,019,365
65-74 years 17,608,023 15,467,151 33,075,174
75-84 years 9,318,438 7,320,885 16,639,323
85 years and over 4,294,658 2,431,872 6,726,530


This permits us to (for example) demonstrate that COVID-19 is not an important source of death for U.S. people under the age of 15, accounting for only about 0.5% of all deaths (Figure 10).

covid.deaths(age.group=c("Under 1 year","1-4 years","5-14 years"),
    data=Data,
    plot="smooth",
    show="per.capita",
    cumulative=TRUE,
    split.groups=FALSE,
    las=1,cex.axis=0.8,cex.lab=0.9)
**Figure 10**: a) Cumulative confirmed COVID-19 deaths for children <15. b) Cumulative COVID deaths (red) and deaths by other causes (blue) for all ages.

By contrast, COVID-19 accounts for nearly 15% of all deaths of U.S. persons aged ≥75 years (Figure 11).

covid.deaths(age.group=c("75-84 years", "85 years and over"),
    data=Data,
    plot="smooth",
    show="percent",
    cumulative=TRUE,
    split.groups=FALSE,
    las=1,cex.axis=0.8,cex.lab=0.9)
**Figure 11**: a) Cumulative confirmed COVID-19 deaths for U.S. adults 75 years of age and older. b) Cumulative COVID deaths (red) and deaths by other causes (blue) for all ages.

Note the difference between the axes of Figure 10 and Figure 11. In the former, deaths are shown per 1M population, whereas the latter reports COVID-19 deaths as a fraction of all deaths. Both types of visualization can be used for data subdivided by age group.
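The two scales can be computed as in the sketch below. The population figures for the three youngest age groups come from Table 2, but the death counts are hypothetical placeholders chosen only to show the arithmetic; they are not CDC values.

```r
# Population under 15 years of age, summed from Table 2
pop_under15 <- sum(c(4128810, 16438858, 41008879))

# Hypothetical cumulative death counts (illustration only, not CDC data)
covid_deaths <- 100
all_deaths   <- 20000

# Scale used in Figure 10: deaths per 1M population
deaths_per_million <- covid_deaths / pop_under15 * 1e6

# Scale used in Figure 11: COVID-19 deaths as a percentage of all deaths
percent_of_deaths <- 100 * covid_deaths / all_deaths

c(per_million = deaths_per_million, percent = percent_of_deaths)
```

The first quantity depends on the size of the age group, while the second depends only on the mix of causes of death within it.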

Excess mortality analysis

Lastly, the covid19.Explorer R package and website allow users to explore excess mortality (also called mortality displacement) by age and state, regardless of cause.

For this analysis, we used the 2014-2018 weekly counts of deaths by state and select causes and the 2019-2020 provisional death counts from the National Center for Health Statistics (NCHS) of the United States Centers for Disease Control and Prevention.

Provisional death counts are incomplete, particularly for recent weeks, due to the lag time between when a death occurs and when a death certificate is submitted to the NCHS. This accounts for the drop-off in excess deaths towards the right of each plot in 2020 data. This lag may also differ between different jurisdictions.

To compute the raw death counts for any jurisdiction, we simply combined the 2015-2018 counts (we excluded 2014) with the 2019-2020 provisional counts. Table 3 gives an example of tabulated death counts for the state of Massachusetts.

Deaths<-state.deaths(plot="Deaths")
Deaths<-cbind(Week=1:52,Deaths)
knitr::kable(Deaths,
    caption="**Table 3**: Weekly death counts and provisional death counts in Massachusetts according to the NCHS from 2015 through 2020.",
    align="r",format = "html", table.attr = "style='width:50%;'",
    format.args=list(big.mark=","))
Table 3: Weekly death counts and provisional death counts in Massachusetts according to the NCHS from 2015 through 2020.
Week 2015 2016 2017 2018 2019 2020
1 1,303 1,123 1,247 1,373 1,207 1,235
2 1,335 1,131 1,310 1,455 1,265 1,266
3 1,349 1,138 1,282 1,311 1,291 1,190
4 1,404 1,146 1,239 1,252 1,273 1,302
5 1,401 1,046 1,294 1,274 1,268 1,204
6 1,353 960 1,262 1,287 1,246 1,233
7 1,300 953 1,378 1,273 1,270 1,251
8 1,226 1,023 1,302 1,208 1,203 1,166
9 1,241 1,135 1,244 1,228 1,277 1,278
10 1,251 1,145 1,254 1,234 1,198 1,214
11 1,204 1,168 1,184 1,144 1,151 1,311
12 1,212 1,161 1,141 1,188 1,112 1,170
13 1,219 1,236 1,220 1,189 1,225 1,289
14 1,190 1,173 1,219 1,180 1,213 1,549
15 1,189 1,150 1,230 1,149 1,167 2,038
16 1,157 1,107 1,090 1,141 1,218 2,492
17 1,155 1,108 1,142 1,105 1,087 2,533
18 1,187 1,094 1,157 1,160 1,104 2,308
19 1,100 1,098 1,040 1,133 1,081 2,071
20 1,046 1,023 1,107 1,114 1,045 1,885
21 1,082 1,121 1,038 1,102 1,068 1,578
22 1,003 1,064 1,087 1,140 1,095 1,358
23 1,088 1,057 1,025 1,110 1,099 1,235
24 1,077 1,082 1,078 1,098 1,045 1,139
25 1,018 1,062 1,092 1,048 1,062 1,126
26 1,099 993 1,088 1,042 1,080 1,082
27 1,064 1,001 1,094 1,146 1,107 1,046
28 984 987 1,110 1,083 1,067 1,046
29 1,050 1,013 1,030 1,017 1,038 1,060
30 1,034 1,021 1,038 1,052 1,103 1,088
31 978 1,059 1,104 1,082 1,038 1,021
32 1,001 1,051 1,083 1,050 1,090 994
33 1,006 1,075 1,105 1,042 1,032 1,049
34 1,032 1,023 1,052 1,015 1,061 1,032
35 1,001 1,022 1,048 1,039 1,075 1,058
36 1,028 1,060 1,065 1,061 1,033 1,015
37 1,013 1,036 1,064 1,099 1,070 1,012
38 991 1,022 977 1,118 1,083 1,094
39 1,068 1,081 1,041 1,045 1,114 1,123
40 1,132 1,050 1,130 1,127 1,122 1,102
41 1,124 1,105 1,061 1,092 1,096 1,095
42 1,082 1,144 1,113 1,188 1,128 1,129
43 1,122 1,052 1,091 1,137 1,162 1,188
44 1,134 1,147 1,160 1,166 1,108 1,087
45 1,092 1,160 1,094 1,107 1,117 1,213
46 1,090 1,199 1,208 1,144 1,130 1,224
47 1,102 1,132 1,157 1,141 1,231 1,203
48 1,003 1,144 1,177 1,109 1,204 1,239
49 1,058 1,125 1,131 1,278 1,284 1,292
50 1,081 1,106 1,148 1,147 1,179 1,257
51 1,047 1,178 1,265 1,277 1,148 1,046
52 1,041 1,239 1,245 1,214 1,205 928


To correct observed deaths in prior years to 2020 levels, we multiplied each past-year death tally by the ratio of the jurisdiction's population in 2020 to its population in that past year.
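
As a rough illustration of this correction, the following base R sketch rescales a small matrix of weekly counts. The counts, the populations, and the helper `correct_to_2020` are all hypothetical and are not taken from the package or from the NCHS data.

```r
# Hypothetical weekly death counts (rows = weeks, columns = years)
deaths <- cbind("2015" = c(1000, 1100),
                "2019" = c(1050, 1150),
                "2020" = c(1300, 1900))

# Hypothetical jurisdiction populations by year
pop <- c("2015" = 8.8e6, "2019" = 8.9e6, "2020" = 9.0e6)

# Multiply each year's column by (2020 population / that year's population)
correct_to_2020 <- function(deaths, pop) {
  ratio <- pop["2020"] / pop[colnames(deaths)]
  sweep(deaths, 2, ratio, `*`)
}

corrected <- correct_to_2020(deaths, pop)
print(round(corrected, 1))
```

The 2020 column is unchanged because its ratio is exactly 1, which matches the whole-number 2020 values in the corrected table.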

Table 4 gives an example using the jurisdiction of New Jersey.

Deaths<-state.deaths(state="New Jersey",plot="Deaths",corrected=TRUE)
Deaths<-cbind(Week=1:52,Deaths)
knitr::kable(round(Deaths,1),
    caption="**Table 4**: Weekly death counts and provisional death counts in New Jersey, corrected to New Jersey 2020 estimated population.",
    align="r",format = "html", table.attr = "style='width:50%;'",
    format.args=list(big.mark=","))
Table 4: Weekly death counts and provisional death counts in New Jersey, corrected to New Jersey 2020 estimated population.
Week 2015 2016 2017 2018 2019 2020
1 1,612.9 1,446.8 1,536.1 1,695.3 1,539.7 1,628
2 1,705.8 1,511.4 1,711.5 1,780.0 1,503.4 1,587
3 1,693.7 1,435.7 1,618.8 1,617.7 1,521.6 1,489
4 1,664.4 1,479.1 1,588.5 1,647.9 1,557.9 1,579
5 1,595.7 1,492.2 1,645.0 1,589.5 1,586.1 1,517
6 1,590.7 1,431.7 1,574.4 1,679.2 1,648.6 1,494
7 1,563.4 1,421.6 1,594.6 1,644.9 1,564.9 1,488
8 1,578.6 1,404.4 1,593.6 1,539.1 1,520.6 1,493
9 1,477.6 1,490.2 1,499.8 1,492.7 1,580.1 1,571
10 1,515.9 1,467.0 1,594.6 1,478.6 1,539.7 1,507
11 1,481.6 1,487.2 1,520.0 1,432.2 1,469.1 1,477
12 1,411.9 1,540.7 1,456.5 1,528.0 1,524.6 1,597
13 1,402.8 1,360.0 1,451.5 1,521.9 1,455.0 2,124
14 1,453.3 1,371.1 1,517.0 1,474.6 1,433.9 3,481
15 1,398.8 1,418.5 1,435.3 1,454.4 1,414.7 4,769
16 1,290.7 1,442.8 1,431.3 1,453.4 1,403.6 4,729
17 1,352.3 1,385.2 1,420.2 1,379.8 1,323.9 3,976
18 1,349.3 1,307.5 1,349.7 1,478.6 1,394.5 3,180
19 1,331.1 1,352.9 1,412.1 1,321.4 1,354.2 2,600
20 1,273.6 1,347.9 1,419.2 1,270.0 1,369.3 2,285
21 1,294.8 1,320.6 1,363.8 1,349.6 1,398.6 1,937
22 1,293.8 1,309.5 1,370.8 1,279.0 1,422.8 1,745
23 1,330.1 1,280.2 1,348.6 1,372.8 1,419.7 1,583
24 1,286.7 1,391.3 1,319.4 1,372.8 1,349.2 1,485
25 1,292.7 1,278.2 1,351.7 1,378.8 1,304.8 1,410
26 1,342.2 1,322.6 1,333.5 1,343.5 1,361.3 1,444
27 1,295.8 1,338.8 1,333.5 1,411.1 1,405.6 1,405
28 1,289.7 1,282.2 1,311.4 1,336.5 1,406.6 1,386
29 1,258.4 1,263.0 1,351.7 1,391.9 1,349.2 1,322
30 1,264.5 1,320.6 1,266.0 1,373.8 1,369.3 1,368
31 1,315.0 1,290.3 1,327.5 1,386.9 1,303.8 1,388
32 1,282.6 1,333.7 1,347.6 1,365.7 1,345.1 1,341
33 1,216.0 1,345.8 1,268.0 1,341.5 1,321.9 1,306
34 1,268.5 1,362.0 1,340.6 1,362.7 1,397.6 1,338
35 1,295.8 1,323.6 1,428.3 1,419.1 1,283.6 1,335
36 1,238.2 1,290.3 1,346.6 1,357.6 1,340.1 1,395
37 1,271.5 1,342.8 1,335.5 1,341.5 1,304.8 1,386
38 1,307.9 1,270.1 1,370.8 1,347.6 1,363.3 1,372
39 1,257.4 1,315.5 1,337.6 1,292.1 1,327.0 1,388
40 1,402.8 1,380.2 1,312.4 1,420.1 1,367.3 1,328
41 1,350.3 1,367.0 1,310.3 1,385.9 1,420.7 1,396
42 1,356.4 1,454.9 1,279.1 1,394.9 1,417.7 1,335
43 1,306.9 1,418.5 1,349.7 1,454.4 1,467.1 1,466
44 1,363.4 1,382.2 1,356.7 1,492.7 1,335.0 1,427
45 1,335.2 1,404.4 1,352.7 1,384.9 1,477.2 1,558
46 1,290.7 1,499.3 1,437.3 1,475.6 1,485.3 1,619
47 1,336.2 1,415.5 1,501.9 1,481.6 1,509.5 1,660
48 1,327.1 1,434.7 1,406.1 1,487.7 1,441.9 1,679
49 1,356.4 1,459.9 1,542.2 1,509.8 1,552.8 1,859
50 1,364.5 1,456.9 1,554.3 1,482.6 1,599.2 1,883
51 1,336.2 1,508.4 1,527.1 1,464.5 1,472.2 1,472
52 1,338.2 1,595.2 1,563.3 1,500.8 1,585.1 1,178


To compute excess deaths for any jurisdiction, we took each column of Table 3 (or Table 4, for corrected death counts) and subtracted, week by week, the mean of the 2015 through 2019 columns. This treats 2015 through 2019 as ‘normal’ years, and 2020 as unusual. Table 5 gives an example of this calculation using the jurisdiction of Texas, and weekly death counts from 2015-2019 corrected to 2020 population.

Excess<-state.deaths(state="Texas",plot="Excess",corrected=TRUE)
Excess<-cbind(Week=1:52,Excess)
knitr::kable(round(Excess,1),
    caption="**Table 5**: Weekly excess death counts and provisional death counts in Texas, compared to 2015-2019 corrected death counts.",
    align="r",format = "html", table.attr = "style='width:50%;'",
    format.args=list(big.mark=","))
Table 5: Weekly excess death counts and provisional death counts in Texas, compared to 2015-2019 corrected death counts.
Week 2015 2016 2017 2018 2019 2020
1 206.9 -270.8 -370.5 645.2 -210.9 -204.6
2 189.5 -527.9 -109.7 666.9 -218.8 22.5
3 145.0 -399.0 -214.3 676.3 -208.0 -164.8
4 20.3 -271.6 -95.4 510.3 -163.7 -172.9
5 -26.3 -221.9 -64.8 370.0 -57.0 -116.0
6 -87.4 -63.3 28.5 252.8 -130.5 22.0
7 -130.9 -77.0 1.2 216.3 -9.6 97.3
8 15.9 -153.6 110.4 38.9 -11.6 32.9
9 60.3 -44.7 52.7 63.8 -132.0 -24.4
10 -18.8 -179.8 85.3 14.8 98.4 -40.2
11 95.0 -146.7 -17.0 -63.6 132.3 66.3
12 -42.3 -30.8 9.0 -100.7 164.8 193.9
13 -32.6 49.1 101.2 -129.2 11.5 272.5
14 -2.9 94.9 -32.6 -76.9 17.5 260.0
15 32.8 -53.4 -61.3 14.1 67.9 326.7
16 -28.4 -16.4 -55.9 22.1 78.6 413.9
17 10.2 4.8 51.6 -47.3 -19.3 576.7
18 68.4 -54.3 -27.3 25.8 -12.6 378.3
19 -18.5 14.5 36.4 -9.3 -23.1 473.7
20 -61.9 -3.3 63.1 11.7 -9.6 459.9
21 33.9 77.7 -44.8 -106.8 40.0 457.3
22 56.5 -123.1 -20.6 26.4 60.7 360.9
23 -35.4 40.2 -122.6 -17.8 135.5 389.6
24 -78.3 113.2 35.0 -34.9 -34.9 618.1
25 -109.5 34.5 73.2 -69.5 71.3 736.5
26 -53.8 4.7 -17.9 33.7 33.3 818.0
27 -14.3 28.9 61.7 -203.3 126.9 1,282.9
28 87.7 -28.4 -210.4 43.9 107.1 2,038.5
29 26.2 -10.2 -84.9 62.9 6.0 2,544.2
30 -85.3 86.7 24.7 -1.9 -24.2 2,541.9
31 -7.2 -85.2 10.5 0.9 80.9 2,148.9
32 -96.9 -4.8 10.2 -12.9 104.4 1,970.8
33 -80.8 27.8 -87.8 85.4 55.4 2,047.1
34 -142.6 8.3 22.7 126.8 -15.2 1,262.8
35 -75.1 93.2 30.5 -64.2 15.6 1,400.7
36 -139.5 66.4 69.1 -89.4 93.5 1,199.9
37 -82.6 29.7 60.4 -30.0 22.6 880.4
38 -5.4 -6.6 49.2 56.6 -93.8 882.5
39 -32.4 -121.2 36.1 -12.6 130.1 813.4
40 -62.3 -46.9 63.2 -9.7 55.7 726.6
41 -166.5 -7.8 132.0 -43.7 85.9 854.6
42 -74.3 -96.7 89.3 -51.5 133.2 632.3
43 -199.6 19.5 121.0 -23.7 82.8 846.1
44 -130.9 -101.6 198.9 -81.2 114.8 1,087.2
45 -159.9 -56.8 44.8 31.0 140.9 1,295.1
46 -3.1 -103.4 -44.3 78.1 72.8 1,225.1
47 -188.0 -116.1 1.8 68.2 234.2 1,117.9
48 -167.3 -42.8 58.2 48.0 103.9 1,044.5
49 127.6 -184.7 31.3 0.6 25.3 771.2
50 -95.2 -83.9 55.0 -18.9 143.0 127.2
51 -187.0 -97.5 403.0 -73.4 -45.1 -1,257.9
52 -298.7 -71.7 291.3 -27.3 106.5 -2,327.7
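
The subtraction behind a table like Table 5 can be sketched in a few lines of base R. The counts below are invented for illustration, and the variable names are assumptions rather than package internals.

```r
# Hypothetical (already corrected) weekly death counts: rows = weeks
deaths <- cbind("2015" = c(1000,  980), "2016" = c(1020,  990),
                "2017" = c(1010, 1000), "2018" = c(1030, 1010),
                "2019" = c( 990, 1020), "2020" = c(1400, 1500))

baseline_years <- as.character(2015:2019)

# Week-by-week mean of the 'normal' years
baseline <- rowMeans(deaths[, baseline_years])

# Subtract that weekly baseline from every column, including 2020
excess <- deaths - baseline
print(excess)
```

By construction, the excess values of the five baseline years sum to zero within each week, which is why each row of the 2015-2019 columns in Table 5 adds up to approximately zero.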


To compute the weekly death rate per one million population, we divided the raw number of deaths (as in Table 3) by the total population (in millions) of each jurisdiction. Note that, because we used a different population size for each year of data (Table 2), the notion of ‘correcting’ to 2020 population size is meaningless here.
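
A minimal sketch of this division, assuming made-up counts and populations for a small jurisdiction (none of these numbers come from the NCHS or Census data):

```r
# Hypothetical raw weekly death counts (rows = weeks, columns = years)
deaths <- cbind("2019" = c(120, 110),
                "2020" = c(130, 200))

# Hypothetical population of the jurisdiction in each year
pop <- c("2019" = 705000, "2020" = 712000)

# Divide each year's column by that year's population in millions
per_million <- sweep(deaths, 2, pop[colnames(deaths)] / 1e6, `/`)
print(round(per_million, 1))
```

Because each column has its own denominator, no separate correction to 2020 population is needed on this scale.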

Table 6 gives the weekly per 1M population death rate for the District of Columbia.

PerCapita<-state.deaths(state="District of Columbia",plot="PerCapita")
PerCapita<-cbind(Week=1:52,PerCapita)
knitr::kable(round(PerCapita,1),
    caption="**Table 6**: Weekly deaths and provisional deaths per 1M population in the District of Columbia.",
    align="r",format = "html", table.attr = "style='width:50%;'",
    format.args=list(big.mark=","))
Table 6: Weekly deaths and provisional deaths per 1M population in the District of Columbia.
Week 2015 2016 2017 2018 2019 2020
1 204.3 144.4 191.4 220.9 184.2 185.5
2 214.7 172.1 182.8 223.8 172.9 179.9
3 152.5 161.9 179.9 172.5 148.8 196.7
4 198.4 161.9 171.2 193.9 164.4 184.1
5 176.2 177.9 184.2 198.1 175.7 200.9
6 180.6 170.6 177.0 206.7 170.0 184.1
7 167.3 183.7 211.5 185.3 161.5 171.4
8 177.7 135.6 179.9 135.4 160.1 146.1
9 146.6 182.3 191.4 181.0 150.2 189.7
10 191.0 182.3 195.7 159.6 171.4 165.8
11 176.2 151.6 165.5 168.2 171.4 160.2
12 182.1 176.4 197.1 159.6 147.4 181.3
13 168.8 183.7 185.6 192.4 143.1 182.7
14 168.8 179.3 171.2 171.1 157.3 219.2
15 171.8 160.4 177.0 163.9 147.4 271.2
16 151.0 189.6 192.8 155.4 160.1 283.9
17 164.3 177.9 192.8 176.8 177.1 340.1
18 148.1 167.7 161.2 192.4 162.9 302.1
19 133.3 144.4 168.4 152.5 151.6 300.7
20 128.8 153.1 182.8 148.2 124.7 260.0
21 149.5 153.1 142.5 166.8 155.9 254.3
22 134.7 145.8 187.1 151.1 150.2 237.5
23 136.2 182.3 165.5 161.1 151.6 209.4
24 167.3 137.1 151.1 162.5 147.4 206.6
25 154.0 176.4 151.1 149.7 165.8 185.5
26 164.3 144.4 184.2 172.5 128.9 181.3
27 165.8 166.2 159.7 145.4 140.3 171.4
28 137.7 144.4 172.7 163.9 151.6 193.9
29 154.0 167.7 156.9 178.2 175.7 170.0
30 149.5 172.1 132.4 145.4 181.4 172.8
31 142.1 123.9 159.7 188.2 130.4 182.7
32 171.8 185.2 162.6 155.4 154.4 164.4
33 180.6 175.0 158.3 162.5 172.9 164.4
34 154.0 161.9 146.8 158.2 164.4 158.8
35 148.1 164.8 169.8 135.4 167.2 195.3
36 133.3 158.9 164.1 148.2 171.4 196.7
37 155.5 147.3 162.6 165.3 157.3 175.7
38 148.1 192.5 168.4 159.6 170.0 149.0
39 148.1 185.2 174.1 181.0 164.4 156.0
40 154.0 183.7 165.5 149.7 136.0 191.1
41 167.3 156.0 184.2 169.6 180.0 198.1
42 191.0 163.3 171.2 181.0 182.8 163.0
43 159.9 166.2 145.3 198.1 184.2 147.5
44 167.3 166.2 141.0 169.6 195.5 174.2
45 152.5 167.7 177.0 178.2 198.4 184.1
46 155.5 195.4 161.2 153.9 148.8 175.7
47 173.2 179.3 164.1 171.1 172.9 179.9
48 176.2 201.2 166.9 153.9 168.6 154.6
49 149.5 214.3 185.6 151.1 192.7 185.5
50 168.8 177.9 177.0 175.3 174.3 175.7
51 155.5 210.0 192.8 196.7 174.3 74.5
52 133.3 166.2 181.3 146.8 221.0 35.1


To compute the weekly excess per capita deaths (once again, per 1M population), we took the excess deaths and divided by the population size of the corresponding jurisdiction (in millions). The cumulative excess per capita deaths is one way of measuring the death toll of COVID-19.
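
The weekly and cumulative versions of this quantity can be sketched as follows. The excess counts are hypothetical, and for simplicity this sketch assumes a single population figure for the jurisdiction; the package may use year-specific populations instead.

```r
# Hypothetical weekly excess deaths (rows = weeks, columns = years)
excess <- cbind("2019" = c(-10,   5,   5),
                "2020" = c( 40, 400, 600))

# Assumed jurisdiction population, in millions
pop_millions <- 8.0

# Weekly excess deaths per 1M population
excess_per_million <- excess / pop_millions

# Running total through each week, computed separately for each year
cumulative <- apply(excess_per_million, 2, cumsum)
print(cumulative)
```

In this toy example the 2019 excess cancels out over the three weeks, while the 2020 column accumulates steadily, which is the pattern the cumulative plots are designed to reveal.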

Table 7 gives the weekly per 1M population excess death rate for New York City.

PerCapitaExcess<-state.deaths(state="New York City",plot="PerCapitaExcess")
PerCapitaExcess<-cbind(Week=1:52,PerCapitaExcess)
knitr::kable(round(PerCapitaExcess,1),
    caption="**Table 7**: Weekly excess deaths and provisional deaths per 1M population in New York City.",
    align="r",format = "html", table.attr = "style='width:50%;'",
    format.args=list(big.mark=","))
Table 7: Weekly excess deaths and provisional deaths per 1M population in New York City.
Week 2015 2016 2017 2018 2019 2020
1 5.5 -9.8 -4.6 15.0 -6.0 -9.0
2 -1.4 -7.7 0.5 17.5 -8.9 -6.3
3 9.7 -6.9 -3.2 0.7 -0.3 -4.9
4 1.2 -10.1 1.7 10.5 -3.3 -1.5
5 8.9 -8.7 -0.5 3.2 -2.8 2.6
6 0.1 -5.3 -3.7 3.8 5.2 -2.3
7 8.7 -9.4 -3.0 7.2 -3.4 -1.4
8 8.7 4.3 -5.9 -3.1 -3.9 -2.3
9 4.1 6.0 -7.6 0.6 -3.0 -1.4
10 -6.5 5.4 -4.4 4.7 0.8 1.0
11 -0.3 2.1 1.3 -3.0 -0.1 4.3
12 -0.9 3.2 1.7 -2.4 -1.6 41.3
13 3.1 -0.2 -6.6 -3.7 7.5 209.5
14 6.4 -2.2 -6.7 2.6 -0.1 624.1
15 -5.4 3.8 -1.5 3.6 -0.5 808.1
16 -3.7 4.3 1.9 4.7 -7.2 580.0
17 6.8 -2.6 3.5 -2.6 -5.0 362.1
18 -2.7 -4.5 2.2 2.1 2.8 220.5
19 -0.4 3.3 -0.4 -7.8 5.3 127.8
20 -9.4 -2.8 9.9 -4.6 6.9 71.8
21 2.2 -2.9 -2.1 -1.7 4.5 34.5
22 5.3 -3.7 2.6 3.3 -7.4 24.8
23 3.6 -5.6 -5.2 -2.2 9.4 17.2
24 -6.3 0.1 2.7 2.2 1.3 14.7
25 -3.8 -1.1 -2.4 5.7 1.6 -1.5
26 -1.4 -1.8 -2.4 1.9 3.7 12.8
27 -3.0 -2.7 1.4 5.3 -1.0 -0.7
28 -2.3 -1.4 -3.9 6.9 0.7 7.8
29 -1.8 0.0 -0.7 -5.1 7.6 -0.8
30 -0.4 3.2 -2.8 -6.5 6.5 6.0
31 -0.6 2.3 -2.9 -5.0 6.2 -1.4
32 -2.1 -1.8 0.9 3.1 -0.1 -0.4
33 2.3 3.8 -0.1 -1.9 -4.0 -2.0
34 -1.8 -1.6 1.7 1.7 0.1 0.0
35 -0.2 -1.0 1.9 1.8 -2.5 3.9
36 -1.5 1.7 -0.1 4.0 -4.1 -1.0
37 -2.4 -1.4 4.3 -1.2 0.8 -1.3
38 -4.3 -5.8 0.6 2.9 6.5 3.2
39 -7.2 -1.3 6.7 0.1 1.8 0.4
40 2.2 -1.1 -3.7 0.6 1.9 -8.9
41 -3.8 2.0 -7.3 1.5 7.6 -3.2
42 5.7 -1.9 -4.1 0.1 0.1 -2.5
43 -5.0 -7.0 4.0 6.3 1.8 -0.5
44 -0.5 1.5 -0.5 7.6 -8.1 2.2
45 -16.9 2.3 1.8 2.7 10.1 13.5
46 -7.9 5.8 0.9 -3.4 4.6 2.6
47 -5.0 -6.6 2.0 6.6 3.1 8.7
48 -5.5 7.7 1.5 2.6 -6.3 -1.7
49 -5.9 -2.9 -8.1 5.6 11.2 9.1
50 -11.6 1.9 5.3 -1.4 5.8 11.8
51 -14.0 5.7 5.5 -0.5 3.3 12.8
52 -11.4 -0.7 9.9 -3.1 5.3 -12.8


Finally, to calculate the weekly percent above normal, we took the ratio of the observed death count to the 2015-2019 mean for the same week and expressed it as a percentage above (or below) that mean. Once again, this treats 2015 through 2019 as ‘normal’ years, and 2020 as unusual. Table 8 gives an example of the weekly percent above normal for the jurisdiction of Louisiana. To accumulate the percent above normal, we cannot simply sum the week-by-week percentages (as we do for excess deaths). Instead, we must compute the cumulative deaths in each year and compare them to the 2015-2019 average cumulative deaths through the same week.

PercentAbove<-state.deaths(state="Louisiana",plot="PercentAbove")
PercentAbove<-cbind(Week=1:52,PercentAbove)
knitr::kable(round(PercentAbove,1),
    caption="**Table 8**: Weekly percent above normal deaths for Louisiana.",
    align="r",format = "html", table.attr = "style='width:40%;'",
    format.args=list(big.mark=","))
Table 8: Weekly percent above normal deaths for Louisiana.
Week 2015 2016 2017 2018 2019 2020
1 1.6 -8.7 -5.8 19.3 -6.5 -3.9
2 5.6 -14.1 -3.0 16.6 -5.0 -0.3
3 -3.2 -9.1 -7.5 18.7 1.1 -1.3
4 -0.6 -2.6 -6.8 9.2 0.9 3.2
5 -3.8 -5.4 0.1 7.6 1.4 4.3
6 -1.1 -5.2 -3.6 9.2 0.8 0.5
7 3.2 -5.2 -5.3 6.1 1.3 3.0
8 0.7 -5.7 5.1 -3.4 3.3 3.8
9 1.8 0.7 5.4 -10.7 2.8 12.0
10 -4.9 5.6 3.5 -5.2 0.9 9.3
11 0.1 -7.9 2.8 -0.3 5.3 11.5
12 -3.8 -1.7 4.9 -3.1 3.6 6.5
13 -1.6 -1.4 4.5 -6.0 4.6 38.3
14 -4.6 -2.7 6.7 0.0 0.6 56.2
15 1.2 2.5 -1.1 -2.5 -0.1 78.6
16 -3.5 2.2 -1.1 2.8 -0.5 60.9
17 -9.2 -0.5 1.4 3.9 4.5 42.4
18 -1.6 -2.3 -0.1 0.9 3.1 39.6
19 -6.4 0.0 1.8 3.3 1.3 30.4
20 -6.3 1.6 2.7 4.3 -2.3 30.7
21 -3.8 -0.8 0.4 7.6 -3.4 27.2
22 -2.1 -7.4 5.1 -0.5 4.8 26.4
23 -4.5 -2.5 -1.7 2.1 6.6 17.4
24 -4.1 1.3 -2.7 -0.2 5.7 15.3
25 -2.1 -1.9 0.2 0.0 3.8 24.4
26 1.0 -2.7 -1.3 2.7 0.4 19.6
27 -5.4 0.8 -1.1 2.1 3.5 20.3
28 1.4 -4.4 -3.5 2.5 4.0 26.4
29 2.0 0.8 2.0 -3.1 -1.6 37.4
30 -3.0 -0.5 1.3 3.2 -1.0 47.9
31 -5.0 -0.6 2.8 4.1 -1.3 47.0
32 -4.7 -0.5 1.3 0.8 3.0 50.0
33 -10.4 5.5 2.3 -0.3 2.9 41.1
34 0.3 -1.4 0.1 3.6 -2.6 34.1
35 -1.7 -1.1 -4.1 -2.5 9.4 32.0
36 -0.9 -0.7 2.3 0.4 -1.1 34.2
37 -6.5 1.0 5.7 -1.4 1.1 17.0
38 0.7 -2.9 -1.9 0.6 3.6 19.7
39 -6.3 5.8 -2.1 2.8 -0.1 21.6
40 -0.4 8.3 0.8 -3.7 -5.0 20.9
41 -3.9 1.0 2.2 -4.1 4.8 20.8
42 -2.8 -4.5 4.9 -4.9 7.3 12.2
43 -5.8 2.5 -0.7 0.7 3.3 14.1
44 -4.8 -3.5 6.8 3.9 -2.4 4.3
45 -10.4 -6.2 5.7 1.3 9.6 8.2
46 -7.1 0.4 0.5 -2.2 8.3 7.0
47 -2.1 -5.4 2.1 2.8 2.6 4.2
48 -6.7 -1.0 6.1 2.5 -0.9 -2.1
49 -6.0 -10.0 3.9 4.4 7.8 -11.6
50 -6.7 -5.3 4.7 6.9 0.4 -20.8
51 -7.8 -2.7 8.2 -1.7 4.1 -49.4
52 -7.5 -8.0 19.7 -2.2 -2.1 -64.9
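
The difference between the weekly and cumulative percent above normal can be sketched with invented counts. The numbers and variable names below are illustrative assumptions only.

```r
# Hypothetical weekly death counts: rows = weeks, columns = years
deaths <- cbind("2015" = c( 900, 1800), "2016" = c( 950, 1900),
                "2017" = c(1000, 2000), "2018" = c(1050, 2100),
                "2019" = c(1100, 2200), "2020" = c(1100, 3000))

base_yrs <- as.character(2015:2019)

# Weekly percent above normal: observed relative to the weekly baseline mean
weekly_pct <- 100 * (deaths / rowMeans(deaths[, base_yrs]) - 1)

# Cumulative percent above normal: accumulate the deaths first, then compare
cum_deaths <- apply(deaths, 2, cumsum)
cum_pct <- 100 * (cum_deaths / rowMeans(cum_deaths[, base_yrs]) - 1)

print(round(weekly_pct[, "2020"], 1))
print(round(cum_pct[, "2020"], 1))
```

Here the weekly values for 2020 are 10% and 50%, but the cumulative value through week 2 is about 36.7% (4,100 observed against 3,000 expected), which is neither the sum nor the simple average of the weekly percentages.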


Figure 12 shows the weekly and cumulative percent deaths above normal for the U.S. state of Michigan.

state.deaths(state="Michigan",data=Data,
    las=1,cex.axis=0.8,plot="percent above normal")

Figure 12: Weekly (a) and cumulative (b) % deaths above normal for the U.S. state of Michigan.

We calculated deaths above normal by age in exactly the same way. In this case, however, we lacked detailed demographic information by state (i.e., age-stratified population sizes), so we show only excess deaths and percent deaths above normal. Figure 13 gives an example for persons aged 25-44 from the U.S. state of Florida.

age.deaths(state="Florida",
    age.group="25-44 years",
    data=Data,
    las=1,cex.axis=0.8,plot="raw & percent above normal")

Figure 13: Weekly (a) and cumulative percent (b) deaths above normal for adults aged 25-44 in Florida during 2020.

This analysis shows that, for adults 25-44 years of age in Florida, the death rate in 2020 was approximately 30% above normal compared to 2015-2019.

Other analyses

Here, we’ve focused on functionality of the covid19.Explorer R package that can be accessed directly through the web portal; however, it’s also possible to use the package in other, more creative ways. For example, most of the package functions invisibly return the results of model-fitting to the user. Suppose we want to visualize the estimated per capita rate of new infections by state. We can estimate infections (as we have done above), and then use the R package maps to plot these new infections onto a geographic state map of the U.S. The result of this exercise (converted into a .gif file that runs from the beginning of the pandemic through the last date analyzed) is shown below.

states<-c("Alabama","Arizona","Arkansas","California","Colorado",
    "Connecticut","Delaware","Florida","Georgia","Idaho","Illinois",
    "Indiana","Iowa","Kansas","Kentucky","Louisiana",
    "Maine","Maryland","Massachusetts","Michigan","Minnesota",
    "Mississippi","Missouri","Montana","Nebraska","Nevada",
    "New Hampshire","New Jersey","New Mexico","New York (excluding NYC)",
    "New York City","North Carolina","North Dakota","Ohio","Oklahoma","Oregon",
    "Pennsylvania","Rhode Island","South Carolina","South Dakota",
    "Tennessee","Texas","Utah","Vermont","Virginia",
    "Washington","West Virginia","Wisconsin","Wyoming")
## estimate daily infections for each state (one column per state)
obj<-sapply(states,infection.estimator,data=Data,
    ifr=c(0.015,0.01,0.007,0.006,0.006,0.006,0.006),
    plot=FALSE)
## merge the two New York jurisdictions into a single "New York" column
ii<-grep("New York",colnames(obj))
dates<-seq(from=as.Date("2020/1/1"),to=as.Date("2021/4/20"),
    by=1)
rownames(obj)<-as.character(dates)
obj<-cbind(obj[,-ii],rowSums(obj[,ii]))
colnames(obj)[ncol(obj)]<-"New York"
## 2020 population (in millions), repeated for every day, so that
## obj/pop gives daily infections per 1M population
pop<-as.matrix(state.deaths(data=Data,plot="States"))[,"2020"]
pop["New York"]<-pop["New York"]+pop["New York City"]
pop<-matrix(rep(pop[colnames(obj)],nrow(obj)),nrow(obj),ncol(obj),
    byrow=TRUE,dimnames=dimnames(obj))/1000000
daily<-obj/pop
## blue-to-red color ramp for the legend & number of legend ticks
cols<-rgb(colorRamp(c("blue","red"))(seq(0,1,length.out=100)),
    maxColorValue=255)
nticks<-10
## draw one map per day
for(i in 1:nrow(obj)){
    infections<-daily[i,]
    ## color each state relative to that day's maximum (+10)
    colors<-setNames(
        rgb(colorRamp(c("blue","red"))(infections/max(infections+10)),
        maxColorValue=255),
        names(infections))
    dev.hold()
    par(mar=c(0.1,0.1,0.1,0.1))
    plot(NA,xlim=c(-125,-60),ylim=c(24,50),asp=1.3,
        xlab="",ylab="",
        axes=FALSE)
    ## fill in the states one at a time
    for(j in 1:length(colors))
        maps::map("state",regions=names(colors)[j],
            fill=TRUE,add=TRUE,
            col=colors[j],border="white")
    ## tick marks to the right of the color bar
    LWD<-diff(par()$usr[1:2])/dev.size("px")[1]
    Y<-cbind(seq(24,50,length.out=nticks),
        seq(24,50,length.out=nticks))
    X<-cbind(rep(-63+LWD*10/2,nticks),
        rep(-63+LWD*10/2+0.5,nticks))
    for(k in 1:nrow(Y)) lines(X[k,],Y[k,])
    ## vertical color bar legend
    phytools::add.color.bar(50-24,cols,title="",lims=NULL,
        digits=2,direction="upwards",subtitle="",lwd=15,
        x=-63,y=24,
        prompt=FALSE)
    text(x=-65,y=37,"estimated new infections / 1M",srt=90)
    ## tick labels run from 0 to that day's maximum (+10)
    for(k in 1:nticks){
        text(x=X[k,2],y=Y[k,2],round(seq(0,max(infections+10),
            length.out=nticks))[k],pos=4,
            cex=if(k==nticks) 1.2 else 0.7)
    }
    ## date label in the lower left corner
    text(x=-118,y=25,
        rownames(obj)[i],
        font.main=3,cex=1.5)
    dev.flush()
}

Figure 14: Animation of distribution of estimated infections / 1M population by state.

Contact & other information

The R package covid19.Explorer and the website covid19-explorer.org were developed by Dr. Liam Revell. Please contact Liam with questions about this page, the covid19-explorer.org website or the R package.


